The .NET Framework includes both parallel For and parallel ForEach loops; the same pattern is also available through Parallel LINQ (PLINQ). Use the Parallel.For method to iterate over a range of integer indices and the Parallel.ForEach
method to iterate over user-provided values. Use PLINQ if you prefer a
high-level, declarative style for describing loops or if you want to
take advantage of PLINQ’s convenience and flexibility.
Note: To make for and foreach loops with independent iterations run faster on multicore computers, use their parallel counterparts.
1. Parallel for Loops
Here’s an example of a sequential for loop in C#.
int n = ...
for (int i = 0; i < n; i++)
{
// ...
}
To take advantage of multiple cores, replace the for keyword with a call to the Parallel.For method and convert the body of the loop into a lambda expression.
Note: Don’t forget that the steps of the loop body must be independent of one another if you want to use a parallel loop. The steps must not communicate by writing to shared variables.
int n = ...
Parallel.For(0, n, i =>
{
// ...
});
Parallel.For is a static method with overloaded versions. Here’s the signature of the version of Parallel.For that’s used in the example.
Note: Parallel.For uses multiple cores to operate over an index range.
Parallel.For(int fromInclusive,
int toExclusive,
Action<int> body);
In the example, the first
two arguments specify the iteration limits. The first argument is the
lowest index of the loop. The second argument is the exclusive upper
bound, or the largest index plus one. The third argument is an action
that’s invoked once per iteration. The action takes the iteration’s
index as its argument and executes the loop body once for each index.
The Parallel.For method has additional overloaded versions.
Note: The Parallel.For
method does not guarantee any particular order of execution. Unlike a
sequential loop, some higher-valued indices may be processed before
some lower-valued indices.
The example includes a lambda expression in the form args => body as the third argument to the Parallel.For invocation. Lambda expressions are unnamed methods that can capture variables from their enclosing scope. Of course, the body parameter could also be an instance of a delegate type, an anonymous method (using the delegate keyword) or an ordinary named method. In other words, you don’t have to use lambda
expressions if you don’t want to. Examples in this book use lambda
expressions because they keep the code within the body of the loop, and
they are easier to read when the number of lines of code is small.
2. Parallel ForEach
Here’s an example of a sequential foreach loop in C#.
IEnumerable<MyObject> myEnumerable = ...
foreach (var obj in myEnumerable)
{
// ...
}
To take advantage of multiple cores, replace the foreach keyword with a call to the Parallel.ForEach method.
IEnumerable<MyObject> myEnumerable = ...
Parallel.ForEach(myEnumerable, obj =>
{
// ...
});
Parallel.ForEach is a static method with overloaded versions. Here’s the signature of the version of Parallel.ForEach that was used in the example.
Note: Parallel.ForEach runs the loop body for each element in a collection.
ForEach<TSource>(IEnumerable<TSource> source,
Action<TSource> body);
In the example, the first argument is an object that implements the IEnumerable<MyObject> interface. The second argument is a method that’s invoked for each element of the input collection.
The Parallel.ForEach method does not guarantee the order of execution. Unlike a sequential foreach loop, the incoming values aren’t always processed in order.
The Parallel.ForEach method has additional overloaded versions.
Note:
Don’t forget that iterations need to be independent. The loop body must
only make updates to fields of the particular instance that’s passed to
it.
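As a sketch of this rule, the following loop mutates only the instance handed to each iteration. The StringBuilder elements are illustrative stand-ins for your own object type:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

// One mutable object per element of the input collection.
var items = Enumerable.Range(0, 100)
                      .Select(i => new StringBuilder(i.ToString()))
                      .ToList();

Parallel.ForEach(items, sb =>
{
    // Each iteration updates only the instance it was given,
    // so no two iterations share state.
    sb.Append('!');
});

Console.WriteLine(items[10]);
```

If the body instead wrote to a variable declared outside the loop, or to state reachable from more than one element, the iterations would no longer be independent.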
3. Parallel LINQ (PLINQ)
The Language Integrated Query
(LINQ) feature of the .NET Framework includes a parallel version named
PLINQ (Parallel LINQ). There are many options and variations for
expressing PLINQ queries but almost all LINQ-to-Objects expressions can
easily be converted to their parallel counterpart by adding a call to
the AsParallel extension method. Here’s an example that shows both the LINQ and PLINQ versions.
Note: You can convert LINQ expressions to parallel code with the AsParallel extension method.
IEnumerable<MyObject> source = ...
// LINQ
var query1 = from i in source select Normalize(i);
// PLINQ
var query2 = from i in source.AsParallel()
select Normalize(i);
This code example creates two queries that transform values of the enumerable object source. The PLINQ version uses multiple cores if they’re available.
You can also use PLINQ’s ForAll
extension method in cases where you want to iterate over the input
values but you don’t want to select output values to return. This is
shown in the following code.
Note: It’s important to use PLINQ’s ForAll extension method instead of giving a PLINQ query as an argument to the Parallel.ForEach method. For more information, see the section, “Mixing the Parallel Class and PLINQ,” later in this chapter.
IEnumerable<MyObject> myEnumerable = ...
myEnumerable.AsParallel().ForAll(obj => DoWork(obj));
The ForAll extension method is the PLINQ equivalent of Parallel.ForEach.
4. What to Expect
By default, the degree
of parallelism (that is, how many iterations run at the same time in
hardware) depends on the number of available cores. In typical
scenarios, the more cores you have, the faster your loop executes,
until you reach the point of diminishing returns that Amdahl’s Law
predicts. How much faster depends on the kind of work your loop does.
Note: Adding cores makes your loop run faster; however, there’s always an upper limit.
The .NET implementation of the Parallel Loop pattern ensures that exceptions that are thrown during the execution of a loop body are not lost. For both the Parallel.For and Parallel.ForEach methods as well as for PLINQ, exceptions are collected into an AggregateException
object and rethrown in the context of the calling thread. All
exceptions are propagated back to you.
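Here’s a sketch of what that looks like in practice. The failure condition is illustrative; the point is that every thrown exception surfaces inside a single AggregateException on the calling thread:

```csharp
using System;
using System.Threading.Tasks;

int faultCount = 0;

try
{
    Parallel.For(0, 100, i =>
    {
        // Illustrative failure: some iterations throw.
        if (i % 10 == 0)
            throw new InvalidOperationException($"Failure at {i}");
    });
}
catch (AggregateException ae)
{
    // Exceptions from all iterations are collected and rethrown
    // on the calling thread; none are lost.
    foreach (var ex in ae.InnerExceptions)
    {
        faultCount++;
        Console.WriteLine(ex.Message);
    }
}

Console.WriteLine($"{faultCount} exception(s) observed");
```

Note that once an exception occurs, no new iterations start, but iterations already in flight run to completion, so the number of collected exceptions can vary from run to run.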
Note:
You must choose the correct granularity. Too many small parallel loops
can reach a point of over-decomposition where the multicore speedup is
more than offset by the parallel loop’s overhead.
Parallel loops have many variations. There are 12 overloaded methods for Parallel.For and 20 overloaded methods for Parallel.ForEach. PLINQ has close to 200 extension methods. Although there are many overloaded versions of For and ForEach, you can think of the overloads as providing optional configuration. Two examples are setting a maximum degree of parallelism and providing hooks for external cancellation. Other overloads allow the loop body to monitor the progress of other steps (for example, to see whether exceptions are pending) and to manage task-local state. These options are sometimes needed in advanced scenarios.
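For instance, several overloads accept a ParallelOptions argument. The following sketch caps the degree of parallelism and wires up external cancellation; the token is canceled up front purely so the effect is observable:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

using var cts = new CancellationTokenSource();

var options = new ParallelOptions
{
    MaxDegreeOfParallelism = 2,   // cap on concurrent iterations
    CancellationToken = cts.Token // hook for external cancellation
};

int completed = 0;
bool canceled = false;

// Cancel from "outside" before the loop starts, for demonstration.
cts.Cancel();

try
{
    Parallel.For(0, 1000, options, i =>
    {
        Interlocked.Increment(ref completed);
    });
}
catch (OperationCanceledException)
{
    canceled = true;
    Console.WriteLine("Loop was canceled.");
}

Console.WriteLine($"canceled = {canceled}, completed = {completed}");
```

In a real program the cancellation would come from another thread or a UI action; the loop observes the token between iterations and then throws OperationCanceledException on the calling thread.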
Note: Robust exception handling is an important aspect of parallel loop processing.
If you convert a sequential loop to a parallel loop and then find that your program does not behave as expected, the most likely problem is that the loop’s steps are not independent. Here are some common examples of dependent loop bodies:
Note: Check carefully for dependencies between loop iterations! Not noticing dependencies between steps is by far the most common mistake you’ll make with parallel loops.
Writing to shared variables.
If the body of a loop writes to a shared variable, there is a loop body
dependency. This is a common case that occurs when you are aggregating
values. Here is an example, where total is shared across iterations.
for(int i = 1; i < n; i++)
total += data[i];
Shared
variables come in many flavors. Any variable that is declared outside
of the scope of the loop body is a shared variable. Shared references
to types such as classes or arrays will implicitly allow all fields or
array elements to be shared. Parameters that are declared using the
keyword ref result in shared variables. Even reading and writing files can have the same effect as shared variables.
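When the dependency is an aggregation like the total example above, one way to remove it is the Parallel.For overload that supports task-local state. This is only a sketch (the data values are illustrative); aggregation is a pattern in its own right:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

int n = 1000;
var data = new int[n];
for (int i = 0; i < n; i++) data[i] = 1;

int total = 0;

// Each task accumulates into its own subtotal (the task-local
// state), and the subtotals are combined with an atomic add at
// the end, so no iteration ever writes to the shared `total`.
Parallel.For(0, n,
    () => 0,                                        // initialize local state
    (i, loopState, subtotal) => subtotal + data[i], // per-iteration body
    subtotal => Interlocked.Add(ref total, subtotal)); // combine results

Console.WriteLine(total);
```

The shared write now happens once per task rather than once per iteration, and it is protected by Interlocked.Add, so the result is deterministic.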
Using properties of an object model.
If the object being processed by a loop body exposes properties, you
need to know whether those properties refer to shared state or state
that’s local to the object itself. For example, a property named Parent is likely to refer to global state. Here’s an example.
for(int i = 0; i < n; i++)
SomeObject[i].Parent.Update();
In this example, it’s likely that the loop iterations are not independent. For all values of i, SomeObject[i].Parent is a reference to a single shared object.
Referencing data types that are not thread safe. If the body of the parallel loop uses a data type that is not thread safe, the loop body is not independent
(there is an implicit dependency on the thread context).
Note:
You must be extremely cautious when getting data from properties and
methods. Large object models are known for sharing mutable state in
unbelievably devious ways.
Loop-carried dependence. If the body of a parallel for loop performs arithmetic on the loop index, there is likely to be a dependency that is known as loop-carried dependence. This is shown in the following code example. The loop body references data[i] and data[i - 1]. If Parallel.For is used here, there’s no guarantee that the loop body that updates data[i - 1] has executed before the loop body for data[i].
for(int i = 1; i < N; i++)
data[i] = data[i] + data[i - 1];
Sometimes, it’s possible to
use a parallel algorithm in cases of loop-carried dependence, but this
is outside the scope of this book. Your best bet is to look elsewhere
in your program for opportunities for parallelism or to analyze your
algorithm and see if it matches some of the advanced parallel patterns
that occur in scientific computing. Parallel scan and parallel dynamic
programming are examples of these patterns.
Note: Arithmetic on loop index variables, especially addition or subtraction, usually indicates loop-carried dependence.
When you look for
opportunities for parallelism, profiling your application is a way to
deepen your understanding of where your application spends its time;
however, profiling is not a substitute for understanding your
application’s structure and algorithms. For example, profiling doesn’t
tell you whether loop bodies are independent.
Note: Don’t expect miracles from profiling—it can’t analyze your algorithms for you. Only you can do that.